A Search in the Forest: Efficient Algorithms for Parsing and Machine Translation Based on Packed Forests
A Dissertation Proposal in Computer and Information Science

Author

  • Liang Huang
Abstract

Many problems in Natural Language Processing (NLP) involve an efficient search for the best derivation over (exponentially) many candidates. For example, a parser aims to find the best syntactic tree for a given sentence among all derivations under a grammar, and a machine translation (MT) decoder explores the space of all possible translations of the source-language sentence. In these cases, a packed forest provides a compact representation of the huge search space by sharing common sub-derivations, which makes efficient algorithms based on Dynamic Programming (DP) possible. Building on the hypergraph formulation of forests and well-known 1-best DP algorithms, this dissertation develops fast and exact k-best DP algorithms on forests, which are orders of magnitude faster than previously used methods on state-of-the-art parsers. We also show empirically that the improved output of our algorithms has the potential to improve results from parse reranking systems and other applications. We then extend these algorithms to approximate search when the forests are too large for exact inference. We discuss two instances of this new method: forest rescoring for MT decoding and forest reranking for parsing. In both cases, our methods run orders of magnitude faster than conventional approaches. In the latter, faster search also leads to better learning: our approximate decoding makes whole-Treebank discriminative training practical and yields an accuracy higher than that of any previously reported system trained on the Treebank. Finally, we discuss the work to be completed, which consists mainly of further investigation of forest reranking and its application to sequence labeling, dependency parsing, and other problems such as machine translation.
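The idea of a packed forest as a hypergraph, with k-best dynamic programming over it, can be illustrated with a small sketch. The Python below is a minimal, hypothetical illustration, not the proposal's implementation: the names `Hyperedge`, `kbest`, and `derivation` are invented here, and costs are assumed to be additive with lower being better. It visits nodes in topological order and builds each node's k-best list with a priority queue over (hyperedge, tail-rank) combinations. With k = 1 it reduces to the familiar 1-best Viterbi-style DP.

```python
import heapq
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Hyperedge:
    """A hyperedge from tail nodes (sub-derivations) to a head node (hypothetical)."""
    head: str
    tails: Tuple[str, ...]  # empty tuple = axiom (e.g. a word/tag item)
    cost: float  # assumed local cost; lower is better, costs add up


# One k-best entry: (total cost, hyperedge used, rank chosen in each tail's list)
Entry = Tuple[float, Hyperedge, Tuple[int, ...]]


def kbest(topo_nodes: List[str], edges: List[Hyperedge], k: int) -> Dict[str, List[Entry]]:
    """Eager k-best DP over an acyclic hypergraph (k=1 gives 1-best Viterbi)."""
    incoming: Dict[str, List[Hyperedge]] = {v: [] for v in topo_nodes}
    for e in edges:
        incoming[e.head].append(e)

    best: Dict[str, List[Entry]] = {}

    def combo_cost(e: Hyperedge, ranks: Tuple[int, ...]) -> float:
        # Cost of applying e to the chosen sub-derivation of each tail.
        return e.cost + sum(best[t][r][0] for t, r in zip(e.tails, ranks))

    for v in topo_nodes:  # tails are always finished before their head
        heap: List[Tuple[float, int, Tuple[int, ...]]] = []
        seen = set()
        # Seed the queue with the best combination of every incoming hyperedge.
        for i, e in enumerate(incoming[v]):
            if all(best[t] for t in e.tails):  # skip edges with an underivable tail
                ranks = (0,) * len(e.tails)
                heapq.heappush(heap, (combo_cost(e, ranks), i, ranks))
                seen.add((i, ranks))
        best[v] = []
        while heap and len(best[v]) < k:
            cost, i, ranks = heapq.heappop(heap)
            e = incoming[v][i]
            best[v].append((cost, e, ranks))
            # Successors: advance one tail to its next-best sub-derivation.
            for pos in range(len(ranks)):
                nxt = ranks[:pos] + (ranks[pos] + 1,) + ranks[pos + 1:]
                if nxt[pos] < len(best[e.tails[pos]]) and (i, nxt) not in seen:
                    seen.add((i, nxt))
                    heapq.heappush(heap, (combo_cost(e, nxt), i, nxt))
    return best


def derivation(best: Dict[str, List[Entry]], v: str, r: int):
    """Rebuild the r-th best derivation of node v as a nested (node, children) tree."""
    _, e, ranks = best[v][r]
    return (v, [derivation(best, t, j) for t, j in zip(e.tails, ranks)])


# Hypothetical toy forest: items A and B each have two analyses,
# and the root S can be built either from (A, B) or from A alone.
example_edges = [
    Hyperedge("A", (), 1.0), Hyperedge("A", (), 2.0),
    Hyperedge("B", (), 1.0), Hyperedge("B", (), 3.0),
    Hyperedge("S", ("A", "B"), 0.5), Hyperedge("S", ("A",), 2.5),
]
example_best = kbest(["A", "B", "S"], example_edges, k=4)
for rank, (cost, _, _) in enumerate(example_best["S"]):
    print(rank, cost, derivation(example_best, "S", rank))
```

On this toy forest the root's four best costs are 2.5, 3.5, 3.5, and 4.5, shared sub-derivations of A and B being reused by both analyses of S. This eager variant fills a complete k-best list at every node; lazier schemes that expand sub-lists only on demand can avoid much of that work.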

Similar sources

Forest-based Algorithms in Natural Language Processing

Liang Huang. Supervisors: Aravind K. Joshi and Kevin Knight. Many problems in Natural Language Processing (NLP) involve an efficient search for the best derivation over (exponentially) many candidates. For example, a parser aims to find the best syntactic tree for a given sentence among all derivations under a grammar, and a machine translat...

A Random Forest Classifier based on Genetic Algorithm for Cardiovascular Diseases Diagnosis (RESEARCH NOTE)

Machine learning-based classification techniques provide support for the decision-making process in healthcare, especially in disease diagnosis, prognosis, and screening. Healthcare datasets are voluminous by nature, and their high dimensionality leads to slower learning and higher computational cost. Feature selection is expected to deal with the high dimen...

A Model for Detecting of Persian Rumors based on the Analysis of Contextual Features in the Content of Social Networks

A rumor is a collective attempt to interpret a vague but attractive situation by using the power of words. Therefore, identifying the language of rumors can help in detecting them. To address the rumor detection problem, previous research has focused more on contextual information from reply tweets than on the content features of the original rumor. Most of the studies have been in...

Machine learning algorithms for time series in financial markets

This research concerns the usefulness of different machine learning methods for forecasting time series in financial markets. The main issue in this field is that economic managers and the scientific community still seek more accurate forecasting algorithms. Meeting this need leads to higher forecasting quality and, therefore, greater profitability and efficiency. In this pa...

Automatic Semantic Role Labeling of Persian Sentences Using Dependency Trees

Automatically identifying the words that carry semantic roles (such as Agent, Patient, Source, etc.) in sentences and assigning the correct semantic roles to them may improve many natural language processing tasks, including information extraction, question answering, text summarization, and machine translation. Semantic role labeling systems usually take advantage of syntactic parsing and th...

Publication date: 2008